18 research outputs found

    On the Reliability of Machine Learning Models for Survival Analysis When Cure Is a Possibility

    [Abstract]: In classical survival analysis, it is assumed that all the individuals will experience the event of interest. However, if there is a proportion of subjects who will never experience the event, then a standard survival approach is not appropriate, and cure models should be considered instead. This paper deals with the problem of adapting a machine learning approach for classical survival analysis to a situation when cure (i.e., not suffering the event) is a possibility. Specifically, a brief review of cure models and recent machine learning methodologies is presented, and an adaptation of machine learning approaches to account for cured individuals is introduced. In order to validate the proposed methods, we present an extensive simulation study in which we compare the performance of the adapted machine learning algorithms with existing cure models. The results show the good behavior of the semiparametric or the nonparametric approaches, depending on the simulated scenario. The practical utility of the methodology is showcased through two real-world dataset illustrations. In the first one, the results show the gain of using the nonparametric mixture cure model approach. In the second example, the results show the poor performance of some machine learning methods for small sample sizes.
    This project was funded by the Xunta de Galicia (Axencia Galega de Innovación) Research projects COVID-19 presented in ISCIII IN845D 2020/26, Operational Program FEDER Galicia 2014–2020; by the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union European Regional Development Fund (ERDF)-Galicia 2014–2020 Program, by grant ED431G 2019/01; and by the Spanish Ministerio de Economía y Competitividad (research projects PID2019-109238GB-C22 and PID2021-128045OA-I00). ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish Grant from MICINN (Ministerio de Ciencia e Innovación) with code BGP18/00154. ALC was partially supported by the MICINN Grant PID2020-113578RB-I00 and partial support of Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C-2020-14. Xunta de Galicia; IN845D 2020/2

    Nonparametric Inference in Mixture Cure Models

    Official Doctoral Programme in Statistics and Operations Research. 555V01
    [Abstract] A completely nonparametric method for the estimation of mixture cure models is proposed. An incidence estimator is extensively studied and a latency estimator is presented. These estimators, which are based on the Beran estimator of the conditional survival function, are proven to be the local maximum likelihood estimators. Two i.i.d. representations for the incidence and the latency estimators are obtained. Moreover, an asymptotic expression for the mean squared error of the latency estimator is derived, and its asymptotic normality is proven. In addition, bootstrap bandwidth selection methods for each nonparametric estimator are introduced. The proposed nonparametric estimators are compared with existing semiparametric approaches in simulation studies, in which the performance of the bootstrap bandwidth selectors is also assessed. The nonparametric incidence and latency estimators are applied to a dataset of colorectal cancer patients from the University Hospital of A Coruña (CHUAC). Furthermore, a nonparametric covariate significance test for the incidence is proposed. The method is extended to noncontinuous covariates (binary, discrete and qualitative) and also to contexts with a large number of covariates. The efficiency of the procedure is evaluated in a Monte Carlo simulation study, in which the distribution of the test is approximated by bootstrap. The test is applied to a sarcomas dataset.

    Cure models to estimate time until hospitalization due to COVID-19

    This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10489-021-02311-8
    [Abstract]: A short introduction to survival analysis and censored data is included in this paper. A thorough literature review in the field of cure models has been done. An overview of the most important and recent approaches to parametric, semiparametric and nonparametric mixture cure models is also included. The main nonparametric and semiparametric approaches were applied to a real-time dataset of COVID-19 patients from the first weeks of the epidemic in Galicia (NW Spain). The aim is to model the elapsed time from diagnosis to hospital admission. The main conclusions, as well as the limitations of both the cure models and the dataset, are presented, illustrating the usefulness of cure models in this kind of study, where the influence of age and sex on the time to hospital admission is shown.
    MPL activity was funded by the Science, Technology, and Innovation Plan of the Principality of Asturias (Spain) Ref: FC-GRUPIN-IDI/2018/000225, which is part-funded by the European Regional Development Fund (ERDF). ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish Grant from MICINN (Ministerio de Ciencia, Innovación y Universidades) with reference BGP18/00154. RC and ALC acknowledge partial support by the MINECO grant MTM2017-82724-R, and by the Xunta de Galicia: Grupos de Referencia Competitiva ED431C-2020-14, Centro de Investigación del Sistema universitario de Galicia ED431G 2019/01, and Axencia Galega de Innovación (grants for COVID-19 research projects submitted to the ISCIII call IN845D 2020/26 - FEDER Galicia 2014-2020 Operational Programme), all of them through the ERDF. Gobierno del Principado de Asturias; FC-GRUPIN-IDI/2018/000225. Xunta de Galicia; ED431C-2020-14. Xunta de Galicia; ED431G 2019/0

    npcure: An R Package for Nonparametric Inference in Mixture Cure Models

    [Abstract] Mixture cure models have been widely used to analyze survival data with a cure fraction. They assume that a subgroup of the individuals under study will never experience the event (cured subjects). So, the goal is twofold: to study both the cure probability and the failure time of the uncured individuals through a proper survival function (latency). The R package npcure implements a completely nonparametric approach for estimating these functions in mixture cure models, considering right-censored survival times. Nonparametric estimators for the cure probability and the latency as functions of a covariate are provided. Bootstrap bandwidth selectors for the estimators are included. The package also implements a nonparametric covariate significance test for the cure probability, which can be applied with a continuous, discrete, or qualitative covariate. © 2021. All Rights Reserved.
    The first author’s research was sponsored by the Beatriz Galindo Junior Spanish Grant from Ministerio de Ciencia, Innovación y Universidades (MICINN) with reference BGP18/00154. All the authors acknowledge partial support by the MICINN Grant MTM2017-82724-R (EU ERDF support included), and by Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019 and Grupos de Referencia Competitiva CN2012/130 and ED431C2016-015) and the European Union (European Regional Development Fund - ERDF). Xunta de Galicia; ED431G/01 2016-2019. Xunta de Galicia; CN2012/130. Xunta de Galicia; ED431C2016-01
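
For orientation, the two quantities named in this abstract (cure probability and latency) are usually tied together by the standard mixture cure decomposition of the survival function. The notation below is an assumption chosen for illustration and is not taken from the listing: $p(x)$ is the probability of being uncured (the incidence), $1 - p(x)$ is the cure probability, and $S_0(t \mid x)$ is the latency, a proper survival function for the uncured individuals.

```latex
S(t \mid x) \;=\; \underbrace{1 - p(x)}_{\text{cure probability}} \;+\; p(x)\, S_0(t \mid x),
\qquad \lim_{t \to \infty} S_0(t \mid x) = 0
\;\Longrightarrow\;
\lim_{t \to \infty} S(t \mid x) = 1 - p(x).
```

Because the latency vanishes at infinity, the population survival function levels off at the cure probability, which is why a plateau in an estimated survival curve is read as evidence of a cured subgroup.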

    Nonparametric latency estimation for mixture cure models

    This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11749-016-0515-1
    [Abstract]: A nonparametric latency estimator for mixture cure models is studied in this paper. An i.i.d. representation is obtained, the asymptotic mean squared error of the latency estimator is found, and its asymptotic normality is proven. A bootstrap bandwidth selection method is introduced and its efficiency is evaluated in a simulation study. The proposed methods are applied to a dataset of colorectal cancer patients in the University Hospital of A Coruña (CHUAC).
    The first author’s research was sponsored by the Spanish FPU (Formación de Profesorado Universitario) Grant from MECD (Ministerio de Educación, Cultura y Deporte) with reference FPU13/01371. All the authors acknowledge partial support by the MINECO (Ministerio de Economía y Competitividad) grant MTM2014-52876-R (EU ERDF support included), the MICINN (Ministerio de Ciencia e Innovación) Grant MTM2011-22392 (EU ERDF support included) and Xunta de Galicia GRC Grant CN2012/130. The authors are grateful to Dr. Sonia Pértega and Dr. Salvador Pita, at the University Hospital of A Coruña, for providing the colorectal cancer data set. Xunta de Galicia; CN2012/13

    How do early socioeconomic circumstances impact inflammatory trajectories? Findings from Generation XXI

    © 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article [S. Soares, A. López-Cheda, A. C. Santos, H. Barros, and S. Fraga, «How do early socioeconomic circumstances impact inflammatory trajectories? Findings from Generation XXI», Psychoneuroendocrinology, vol. 119, p. 104755, Sep. 2020] has been accepted for publication in Psychoneuroendocrinology. The Version of Record is available online at https://doi.org/10.1016/j.psyneuen.2020.104755.
    [Abstract]: Background: The association between socioeconomic position and markers of inflammation in adults, including C-reactive protein (CRP), is well-established. We hypothesized that children from families of less-advantaged socioeconomic circumstances may be at higher inflammatory risk during childhood and, consequently, throughout their life course. Thus, we aimed to investigate whether early socioeconomic circumstances impact CRP trajectories using repeated measures of data from a population-based birth cohort. Methods: Data from 2510 participants of Generation XXI, a prospective Portuguese population-based birth cohort, were included in this study. Early socioeconomic circumstances comprised maternal education and occupation, paternal education and occupation, and household income at the child’s birth. Venous blood samples were collected from the children at ages four, seven, and ten years, and high-sensitivity CRP (Hs-CRP) was quantified. Hs-CRP trajectories were computed using a linear mixed-model approach. Results: Participants from less-advantaged socioeconomic circumstances presented higher levels of Hs-CRP by age of ten years. The higher the mother’s education and disposable household income, the lower the minimum value of the log Hs-CRP observed throughout childhood. Further, the age at which that minimum log Hs-CRP value was reached occurs later, meaning that children born in more-advantaged socioeconomic circumstances had lower levels of log Hs-CRP compared with children from less-advantaged families. Conclusions: Poor socioeconomic circumstances early in life are associated with increased inflammation levels throughout the first decade of life. This study demonstrates that social inequalities may impact population health beginning at very early ages.
    This work was supported by the European Regional Development Fund through the Operational Programme Competitiveness and Internationalization and national funding from the Foundation for Science and Technology (FCT), Portuguese Ministry of Science, Technology, and Higher Education under the projects “BioAdversity: How childhood social adversity shapes health: The biology of social adversity” (POCI-01-0145-FEDER-016838; reference FCT PTDC/DTP-EPI/1687/2014), “HIneC: When do health inequalities start? Understanding the impact of childhood social adversity on health trajectories from birth to early adolescence” (POCI-01-0145-FEDER-029567; reference: FCT PTDC/SAU-PUB/29,567/2017). It is also supported by the Unidade de Investigação em Epidemiologia–Instituto de Saúde Pública da Universidade do Porto (EPIUnit) (reference UIDB/04750/2020), Administração Regional de Saúde Norte (Regional Department of Ministry of Health) and Fundação Calouste Gulbenkian; PhD grant SFRH/BD/108742/2015 (to SS) co-funded by FCT and the Human Capital Operational Programme (POCH/FSE Program); FCT Investigator contracts CEECIND/01516/2017 (to SF) and IF/01060/2015 (to ACS); and BEATRIZ GALINDO JUNIOR Spanish Grant (code BEAGAL18/00143) from MICINN (Ministerio de Ciencia, Innovación y Universidades), reference BGP18/00154 (to ALC). This study is also a result of the project DOCnet (NORTE-01-0145-FEDER-000003), supported by the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement. Portugal. Ministério da Ciência, Tecnologia e Ensino Superior; FCT PTDC/DTP-EPI/1687/2014. Portugal. Ministério da Ciência, Tecnologia e Ensino Superior; FCT PTDC/SAU-PUB/29,567/2017. Universidade do Porto; UIDB/04750/202

    Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models

    © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article [López-Cheda, A., Cao, R., Jácome, M.A., Van Keilegom, I., 2017. Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis 105, 144–165] has been accepted for publication in Computational Statistics & Data Analysis. The Version of Record is available online at https://doi.org/10.1016/j.csda.2016.08.002.
    [Abstract]: A completely nonparametric method for the estimation of mixture cure models is proposed. A nonparametric estimator of the incidence is extensively studied and a nonparametric estimator of the latency is presented. These estimators, which are based on the Beran estimator of the conditional survival function, are proved to be the local maximum likelihood estimators. An i.i.d. representation is obtained for the nonparametric incidence estimator. As a consequence, an asymptotically optimal bandwidth is found. Moreover, a bootstrap bandwidth selection method for the nonparametric incidence estimator is proposed. The introduced nonparametric estimators are compared with existing semiparametric approaches in a simulation study, in which the performance of the bootstrap bandwidth selector is also assessed. Finally, the method is applied to a database of colorectal cancer from the University Hospital of A Coruña (CHUAC).
    The first author’s research was sponsored by the Spanish FPU grant from MECD with reference FPU13/01371. The work of the first author has been partially carried out during a visit at the Université catholique de Louvain, financed by INDITEX, with reference INDITEX-UDC 2014. All the authors acknowledge partial support by the MINECO grant MTM2014-52876-R (EU ERDF support included). The first three authors’ research has been partially supported by MICINN Grant MTM2011-22392 (EU ERDF support included) and Xunta de Galicia GRC Grant CN2012/130. The research of the fourth author was supported by IAP Research Network P7/06 of the Belgian State (Belgian Science Policy), and by the contract “Projet d’Actions de Recherche Concertées” (ARC) 11/16-039 of the “Communauté française de Belgique” (granted by the “Académie universitaire Louvain”). The authors would like to thank the Associate Editor and the three anonymous referees for their constructive and helpful comments, which have greatly improved the paper. The authors are grateful to Dr. Sonia Pértega and Dr. Salvador Pita, at the University Hospital of A Coruña, for providing the colorectal cancer data set. Xunta de Galicia; CN2012/13

    Estimating Lengths-Of-Stay of Hospitalized COVID-19 Patients Using a Non-parametric Model: A Case Study in Galicia (Spain)

    [Abstract]: Estimating the lengths-of-stay (LoS) of hospitalised COVID-19 patients is key for predicting the hospital beds’ demand and planning mitigation strategies, as overwhelming the healthcare systems has critical consequences for disease mortality. However, accurately mapping the time-to-event of hospital outcomes, such as the LoS in the intensive care unit (ICU), requires understanding patient trajectories while adjusting for covariates and observation bias, such as incomplete data. Standard methods, such as the Kaplan-Meier estimator, require prior assumptions that are untenable given current knowledge. Using real-time surveillance data from the first weeks of the COVID-19 epidemic in Galicia (Spain), we aimed to model the time-to-event and event probabilities of hospitalised patients, without parametric priors and adjusting for individual covariates. We applied a non-parametric mixture cure model and compared its performance in estimating hospital ward (HW)/ICU LoS to the performances of commonly used methods to estimate survival. We showed that the proposed model outperformed standard approaches, providing more accurate ICU and HW LoS estimates. Finally, we applied our model estimates to simulate COVID-19 hospital demand using a Monte Carlo algorithm. We provided evidence that adjusting for sex, generally overlooked in prediction models, together with age is key for accurately forecasting HW and ICU occupancy, as well as discharge or death outcomes.
    ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish Grant from MICINN (Ministerio de Ciencia, Innovación y Universidades) with reference BGP18/00154. ALC, MAJ and RC acknowledge partial support by the MINECO (Ministerio de Economía y Competitividad) Grant MTM2014-52876-R (EU ERDF support included) and the MICINN Grant MTM2017-82724-R (EU ERDF support included) and partial support of Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G 2019/01 and Grupos de Referencia Competitiva ED431C-2020-14 and ED431C2016-015) and the European Union (European Regional Development Fund - ERDF). PMD is a current recipient of the Grant of Excellence for postdoctoral studies by the Ramón Areces Foundation. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2020/14. Xunta de Galicia; ED431C 2016/01

    Nonparametric Estimation in Mixture Cure Models with Covariates

    This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11749-022-00840-z
    [Abstract] Nonparametric estimation methods for the cure rate and the distribution of the failure time of uncured subjects with covariates for censored survival data have attracted much attention in the last few years. To model the effects of covariates on the distribution of the failure time of uncured subjects, existing works assume that the cure rate is a constant or depends on the same covariate as the distribution of uncured subjects. In this paper, we review the nonparametric estimation methods in the context of the mixture cure model and propose a new nonparametric estimator for the distribution of uncured subjects that relaxes the assumption used in the existing works. The estimation is based on the EM algorithm, which is readily available for mixture cure models, and is strongly consistent. The finite sample performance of the proposed estimator is assessed and compared with existing methods in a simulation study. Finally, the nonparametric estimation methods are employed to model the effects of some covariates on the time to bankruptcy among commercial banks insured by the Federal Deposit Insurance Corporation during the first quarter of 2006.
    ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish grant from Ministerio de Ciencia, Innovación y Universidades with reference BGP18/00154. ALC and MAJ acknowledge partial support by the MINECO Grant MTM2017-82724-R (EU ERDF support included), the MICINN Grant PID2020-113578RB-I00, and partial support of Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G 2019/01 and Grupos de Referencia Competitiva ED431C-2020-14 and ED431C2016-015) and the European Union (European Regional Development Fund - ERDF). YP’s work was partially supported by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada. The authors thank Alessandro Beretta and Cédric Heuchenne for their assistance in obtaining the bank data analyzed in Sect. 7. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C-2020-14. Xunta de Galicia; ED431C2016-01

    Rejoinder on: Nonparametric estimation in mixture cure models with covariates

    This version of the article has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s11749-023-00871-0
    [Abstract]: We thank all discussants for their insightful comments on our paper [López-Cheda, A., Peng, Y. & Jácome, M.A. Nonparametric estimation in mixture cure models with covariates. TEST 32, 467–495 (2023). https://doi.org/10.1007/s11749-022-00840-z]. The comments include some suggestions on possible extensions and some potential issues and concerns in our current work. We respond to the comments as follows